# Vision-to-Text Conversion

Donut Base Finetuned Cord V2

Donut is a visual document understanding model built on a Swin Transformer encoder and fine-tuned on the CORD receipt dataset to extract structured text from images.
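A minimal sketch of how a checkpoint like this might be run with the Hugging Face `transformers` library. It assumes this listing corresponds to the Hub ID `naver-clova-ix/donut-base-finetuned-cord-v2` and that the task start token is `<s_cord-v2>`; neither is stated on this page. The function names `strip_task_prompt` and `parse_receipt` are hypothetical, chosen for illustration.

```python
import re


def strip_task_prompt(sequence: str) -> str:
    """Remove the first tag (the task start token) from a decoded sequence."""
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def parse_receipt(
    image_path: str,
    # Assumed Hub ID for this listing; swap in the checkpoint you actually use.
    checkpoint: str = "naver-clova-ix/donut-base-finetuned-cord-v2",
) -> dict:
    # Heavy imports stay local so the helper above works without them installed.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    model.eval()

    # The visual encoder consumes pixel values; no separate OCR step is involved.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # The text decoder is primed with a task start token (assumed value).
    task_prompt = "<s_cord-v2>"
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

    # Decode, drop padding/end tokens and the task prompt, then convert the
    # remaining tag sequence into a nested dictionary.
    sequence = processor.batch_decode(outputs)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "")
    sequence = sequence.replace(processor.tokenizer.pad_token, "")
    return processor.token2json(strip_task_prompt(sequence))
```

Calling `parse_receipt("receipt.png")` (the file name is a placeholder) downloads the checkpoint on first use and returns a nested dictionary of the fields the model recognized.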

Donut Base Receipt V3

A receipt recognition model fine-tuned from naver-clova-ix/donut-base.

Large Language Model

An image caption generation model based on the VisionEncoderDecoder architecture, capable of converting input images into natural language descriptions.

Donut Base Finetuned Cord V1 2560

Donut is an OCR-free document understanding Transformer model that combines a visual encoder with a text decoder to achieve image-to-text conversion.

Donut Base Finetuned Rvlcdip

Donut is an OCR-free document understanding Transformer model that combines a visual encoder and a text decoder to process document images.

Donut is an OCR-free document understanding Transformer model composed of a visual encoder (Swin Transformer) and a text decoder (BART).
